Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or whole racks of machines) are commonplace and should therefore be handled automatically in software by the framework.

The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them amongst the nodes in the cluster. To process the data, Hadoop MapReduce ships packaged code to the nodes, so that each node processes, in parallel, the data it stores locally. This approach takes advantage of data locality (nodes manipulating the data they have on hand) to allow the data to be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are connected via high-speed networking.

The base Apache Hadoop framework is composed of the following modules:
*''Hadoop Common'' – contains the libraries and utilities needed by the other Hadoop modules;
*''Hadoop Distributed File System (HDFS)'' – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster (a short client sketch follows this list);
*''Hadoop YARN'' – a resource-management platform responsible for managing computing resources in clusters and using them to schedule users' applications; and
*''Hadoop MapReduce'' – a programming model for large-scale data processing.
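To make the HDFS module concrete, here is a minimal sketch of a Java client that reads a file back out of HDFS through Hadoop's standard org.apache.hadoop.fs API. The path /user/demo/input.txt is a hypothetical example, and the sketch assumes the cluster configuration files (core-site.xml, hdfs-site.xml) are on the classpath; the point is that the client sees one logical file even though HDFS stores it as blocks spread across many machines.

<syntaxhighlight lang="java">
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS client sketch: open a file stored in HDFS and print it line by line.
public class HdfsCat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);          // connects to the configured default file system
        Path file = new Path("/user/demo/input.txt");  // hypothetical example path

        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The client reads one logical file; HDFS assembles it
                // transparently from blocks stored across the cluster.
                System.out.println(line);
            }
        }
    }
}
</syntaxhighlight>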
The term "Hadoop" has come to refer not just to the base modules above, but also to the "ecosystem": the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Impala, Apache Flume, Apache Sqoop, Apache Oozie, Apache Storm and others.

Apache Hadoop's MapReduce and HDFS components were inspired by Google papers on MapReduce and the Google File System. The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts. For end users, although MapReduce Java code is common, any programming language can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's program. Other related projects expose higher-level user interfaces.
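As an illustration of the "map" and "reduce" parts in Java, here is a sketch of the canonical word-count job, written against Hadoop's org.apache.hadoop.mapreduce API. The map phase emits a (word, 1) pair for every token of input; the reduce phase sums the counts for each word; input and output paths are taken from the command line.

<syntaxhighlight lang="java">
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The canonical MapReduce example: count how often each word occurs.
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1) for every token in the line
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();            // sum all counts emitted for this word
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</syntaxhighlight>

Registering the reducer class as a combiner as well lets each node pre-aggregate its local counts before any data crosses the network, which is the same data-locality idea described above.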
==History==
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting, who was working at Yahoo! at the time,〔(Hadoop creator goes to Cloudera )〕 named it after his son's toy elephant. It was originally developed to support distributed processing for the Nutch search engine project.〔"Hadoop contains the distributed computing platform that was formerly a part of Nutch. This includes the Hadoop Distributed Filesystem (HDFS) and an implementation of MapReduce." (About Hadoop )〕